x264 in or1ksim
by julius on Oct 27, 2009
julius
Posts: 363 Joined: Jul 1, 2008 Last seen: May 17, 2021
Hi guys,

Building toolchain with newlib and floating point support

You'll want to make a directory to work in, say one under your home directory called or32-build. We'll also install this new version of the toolchain to /opt/or32-newlib, which you'll have to create and chmod a+rwx so that normal users can write to it.
Binutils

Download the binutils sources and extract them, download the patch and apply it, then create a build directory and configure, build and install binutils.
user@host:~/or32-build$ wget ftp://ocuser:oc@orsoc.se/toolchain/binutils-2.18.50.tar.bz2

Paths

If you have another install of the OpenRISC toolchain and it lives in /opt/or32-elf/bin, I would suggest moving /opt/or32-elf to /opt/or32-uclibc (or something appropriate) and creating a symbolic link to whichever version of the toolchain you wish to use. We are building the current toolchain in /opt/or32-newlib, and we've just installed binutils there. We'll now create a symbolic link named /opt/or32-elf pointing to this path, which lets us put just the single path /opt/or32-elf/bin in our PATH variable.
If /opt/or32-elf already exists:

user@host:~/or32-build$ sudo mv /opt/or32-elf /opt/or32-uclibc

And add the following to the ~/.bashrc file:

PATH=$PATH:/opt/or32-elf/bin

GCC and newlib

We use newlib here instead of uClibc because it's easier to make standalone apps based on newlib. You'll want to download both the GCC and newlib sources, patch them, symlink newlib into GCC's directory, make a build directory, then configure, make and install. This section essentially follows the instructions in the OpenRISC GNU toolchain newlib install guide.
user@host:~/or32-build$ wget ftp://ftp.gnu.org/gnu/gcc/gcc-4.2.2/gcc-core-4.2.2.tar.bz2

We'll also create a specs file for GCC. For more information, check out the OpenRISC GNU toolchain newlib install guide.
user@host:~/or32-build/b-gcc$ or32-elf-gcc -dumpspecs > /opt/or32-newlib/lib/gcc/or32-elf/4.2.2/specs

Edit this file and change the *endfile and *link sections to look like the following:
*endfile:
-lor32 -lc -lgcc -lc -lor32

*link:
/opt/or32-newlib/or32-elf/lib/or32.ld

or1ksim

This is a little simpler: just one download, patch and install.
user@host:~/or32-build$ wget ftp://ocuser:oc@orsoc.se/toolchain/or1ksim-0.3.0.tar.bz2

Patching and running x264

Here I'll outline how to download, patch and run x264 in or1ksim. I've created a patch based on git revision e381f6d of x264. Here's how to check out the x264 repository and revert to that revision.

Obtain x264 sources and set revision
user@host:or32-x264$ git clone git://git.videolan.org/x264.git

You should see a message saying HEAD is now at e381f6d. We now want to get the patch and apply it. If you already have the OpenCores h264 project repository checked out, just do an svn update in trunk; the patch is in the x264/patches directory I've made there. At the time of writing the latest patch file is called x264-e381f6d-or32-or1ksim-with-fp-1.0.patch. If you don't have the repository, check it out with:

user@host:or32-x264$ svn co http://opencores.org/ocsvn/oc-h264-encoder/oc-h264-encoder

Patch x264 sources

Apply the patch to the revision e381f6d x264 sources.

user@host:or32-x264/x264$ patch -p1

Configure and build x264

The following command should be used to configure the patched x264. Ensure the path to the toolchain we built earlier is properly set up in your PATH variable.

user@host:or32-x264/x264$ ./configure --disable-avis-input --disable-mp4-output --disable-pthread --enable-debug --host=or32-linux --cross-prefix=or32-elf- --extra-cflags="-g -mhard-mul -mhard-div -mhard-float" --extra-ldflags="-Tlink.ld"

Now a simple make would do, except that I've configured the Makefile to include some raw YUV data in a section of the resulting ELF that we run in the simulator. So we must specify where an H.264 bytestream is. Download one of the sample CIF files from here: http://www.tkn.tu-berlin.de/research/evalvid/cif.html. In my example I'll download the Foreman video. This is then turned into YUV frames with the program ffmpeg, so ensure that is installed too.

user@host:or32-x264$ wget http://www.tkn.tu-berlin.de/research/evalvid/cif/foreman_cif.264

Specify the location of this file when calling make with the H264_VIDEO_FILE variable on the command line, or edit the Makefile and set this variable to the appropriate path.

user@host:or32-x264/x264$ H264_VIDEO_FILE=../foreman_cif.264 make

The make process will generate about 5MB of YUV data (30 frames of CIF 4:2:0), which will be linked into the resulting ELF.
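As a sanity check on that figure: planar 4:2:0 YUV with 8-bit samples stores a full-resolution Y plane plus U and V planes subsampled by two in each direction, i.e. 1.5 bytes per pixel. A minimal C sketch of the arithmetic follows; the helper names are made up for illustration and are not part of the patch or its Makefile.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): bytes in one frame of planar
 * 8-bit YUV 4:2:0. Assumes even width and height, as with CIF and QCIF. */
static size_t yuv420_frame_bytes(size_t width, size_t height)
{
    size_t luma = width * height;                /* one Y sample per pixel */
    size_t chroma = (width / 2) * (height / 2);  /* U and V are subsampled 2x2 */
    return luma + 2 * chroma;
}

/* Hypothetical helper: total raw data for a sequence of frames. */
static size_t yuv420_sequence_bytes(size_t width, size_t height, size_t frames)
{
    return yuv420_frame_bytes(width, height) * frames;
}
```

For CIF (352x288) this gives 152,064 bytes per frame and 4,561,920 bytes for 30 frames (about 4.35 MiB), the same total that appears as get_frame_total_yuv: 352 288 4561920 30 in the or1ksim output quoted later in this thread. A QCIF (176x144) clip would need a quarter of that.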
It also pre-calculates a large array of values that used to take a long time to compute in software each time the encoder started. To save time, I've created a bash script which generates this array and saves it in C format. This can still take a while, but it only gets done once, and at the speed of your host computer rather than at the speed of a simulated OR1k. Even so, you'll notice that when x264 runs it still spends a number of seconds at the beginning in the function x264_analyse_init_costs(), generating a large array for each value of lambda. If anyone knows a way around this, please let us know so we can make this step more efficient.

Run x264 in or1ksim

Finally! We get to run x264 in or1ksim. There is a Makefile rule to build and run the executable in or1ksim (just do make sim), but to do it manually you can do the following:

user@host:or32-x264/x264$ or32-elf-sim -f or1ksim_x264.cfg x264

This uses the included or1ksim configuration file, or1ksim_x264.cfg. You should see or1ksim boot and then the application start. There's still some debugging output in there, but then you'll see a line for each frame it encodes, before it finishes and prints out the stats of the encode.

Configuring x264

Since we're running this standalone, the encoding configuration is hardcoded in the application. I currently have it set to the lowest quality, the ultrafast preset. Look in x264.c in the main() function, where the param struct is being altered; this determines what the encoder does, and the comments indicate which set of assignments does what. I took the ultrafast preset settings from the Parse() function, so look there for other presets if you're interested in playing around with it. If you change the resolution of the file (from CIF to QCIF, or even larger), be sure to set the param.i_width and param.i_height parameters here too. Perhaps these should be set with a define or something.
Why is this useful?

This gives us a platform to easily investigate potential acceleration blocks by first implementing them as C models in or1ksim. I'd expect us to develop the OpenRISC port of the x264 software as we go, resulting in software that should be ready to run on the hardware once the accelerator blocks are finalised and tested in the architectural sim. Of course, once the models of the accelerators are finished, the tricky part is coding the RTL; however, I think that once a C model has given you a good idea of what a block should do, a lot of the tedium of dealing with HDL compilers and slow simulations is eliminated.

Why bother with floating point for the or1200? We're just going to add acceleration blocks anyway, right?

I found the simulation to be incredibly slow when using the software floating point libraries. When I finally got single precision floating point enabled, it went a lot faster. We can't do double precision floating point on the 32-bit incarnation of the OpenRISC; unfortunately that's a 64-bit only thing. As we slowly offload the bulk of the calculation onto accelerator blocks and the CPU does less and less floating point, perhaps we'll switch it off, yielding a saving in hardware, but right now it makes the architectural simulator a lot faster.

To do

There are many things to do regarding this software model.
I hope this is useful.

Julius
RE: x264 in or1ksim
by gil_savir on Oct 27, 2009
gil_savir
Posts: 59 Joined: Dec 7, 2008 Last seen: May 10, 2021
great job, Julius!!!
should we develop the rtl on ORPSOC (in openRISC 1000 project page) environment?
RE: x264 in or1ksim
by julius on Oct 28, 2009
should we develop the rtl on ORPSOC environment?

Yes, I think we'll take a lot of what ORPSoC has in terms of scripts, test benches and RTL, and set up a copy in the h264 project repository. We'll probably want to replace the arbiter with something we can alter, because at the moment it's just a synthesised netlist.

Julius
RE: x264 in or1ksim
by ethanli on Nov 3, 2009
ethanli
Posts: 9 Joined: Sep 19, 2008 Last seen: Apr 27, 2012
Hi,
I followed the instructions to build x264. Everything seems good except for the following:

Edit this file and change the *endfile and *link sections to look like the following:
*endfile: -lor32 -lc -lgcc -lc -lor32
*link: /opt/or32-newlib/or32-elf/lib/or32.ld

Question 1: What is the purpose for these two parts?

Question 2: I didn't find "or32.ld" in my building directory. Does this link script have difference with the default link script?

Thanks,
Ethan
RE: x264 in or1ksim
by julius on Nov 3, 2009
Question 1:
What is the purpose for these two parts?

From the newlib part of the OpenRISC GNU toolchain page (http://opencores.org/openrisc,gnu_toolchain#newlib): The *endfile section specifies object files to include at the end of the link command. Here we specify certain libraries and the order in which they should be used. They are in this order, from left to right, because libor32 requires things from the newlib libc, which requires things from the built-in libgcc, which ultimately needs a couple of calls (sbrk, exit) from libor32. The *link section is for passing options to the linker, but we use it simply to specify which linker script to use.
Question 2:
I didn't find "or32.ld" in my building directory. Does this link script have difference with the default link script?

When you do "make install" of a GCC built with newlib, this linker script should get installed automatically to $(PATH_YOU_INSTALLED_THE_TOOLS_TO)/or32-elf/lib. The file is located in the newlib source at newlib-1.17.0/libgloss/or32/or32.ld. If you alter any of the or32 newlib port's files and do a new "make install", the installed copy is overwritten again with newlib-1.17.0/libgloss/or32/or32.ld, so if you alter it in the install path, be wary of this.

Julius
RE: x264 in or1ksim
by ethanli on Nov 12, 2009
Thanks for your reply.
After several retries I finally got everything working. There were a couple of tricks for me.

1. These commands did not work for me:
ln -s newlib-1.17.0/libgloss gcc-4.2.2/libgloss
ln -s newlib-1.17.0/newlib gcc-4.2.2/newlib
I had to do:
cd newlib-1.17.0
ln -s libgloss ../gcc-4.2.2/libgloss
ln -s newlib ../gcc-4.2.2/newlib

2. To install ffmpeg without root privileges, follow the link http://dev.gemin-i.org/wiki/index.php/Ffmpeg_install_instructions and run the following configure:
CFLAGS="-L/home/ethan/lame-398-2/lib -I/home/yili/lame-398-2/include" LDFLAGS="-L/home/ethan/lame-398-2/lib" ./configure --enable-libmp3lame --enable-libvorbis --disable-mmx --enable-shared --disable-demuxer=v4l --disable-demuxer=v4l2 --disable-indev=v4l --disable-indev=v4l2 --enable-cross-compile

So far I still don't know whether my build is correct or not. Can you tell me how to output the encoded stream into a file? Then I can play it to check whether my build is correct.

Thanks,
RE: x264 in or1ksim
by julius on Nov 13, 2009
Can you tell me how to output the encoded stream into a file?
I'll put up a new patch for x264 this weekend. I've put in a printf at the end which tells the user how much was written to memory, and where. From GDB you can then do a binary dump out to a file and check it. I'll write up a post with the details in a day or so. Being able to verify that what we're doing isn't breaking or altering the output is important, so perhaps I'll find a way to checksum the encoder's output while it's in memory rather than doing this manual dump.

Julius
RE: x264 in or1ksim
by julius on Nov 15, 2009
I've put up a new patch, x264-e381f6d-or32-or1ksim-with-fp-1.1.patch, in trunk/x264/patches. It's just a small update that prints out the location of the data when it finishes encoding. Other changes include switching the configuration to the fast preset with no buffer lookahead (more appropriate for a streaming application) and a VBV buffer only 4 frames big (I think this is useful?!).

Patch x264 sources

Apply the patch to the revision e381f6d x264 sources. (See the original post for information about obtaining the source and setting the x264 revision.)

user@host:or32-x264/x264$ patch -p1

Configure and build x264

The following command should be used to configure the patched x264. Ensure the path to the toolchain we built earlier is properly set up in your PATH variable.

user@host:or32-x264/x264$ ./configure --disable-avis-input --disable-mp4-output --disable-pthread --enable-debug --host=or32-linux --cross-prefix=or32-elf- --extra-cflags="-g -mhard-mul -mhard-div -mhard-float" --extra-ldflags="-Tlink.ld"

Now a simple make would do, except that I've configured the Makefile to include some raw YUV data in a section of the resulting ELF that we run in the simulator. So we must specify where some sample video data is. Download one of the sample CIF files from here: http://www.tkn.tu-berlin.de/research/evalvid/cif.html. In my example I'll download the Foreman video. This is then turned into YUV frames with the program ffmpeg, so ensure that is installed too.

user@host:or32-x264$ wget http://www.tkn.tu-berlin.de/research/evalvid/cif/foreman_cif.264

Specify the location of this file when calling make with the INPUT_VIDEO_FILE variable on the command line, or edit the Makefile and set this variable to the appropriate path.

user@host:or32-x264/x264$ INPUT_VIDEO_FILE=../foreman_cif.264 make

The make process will generate about 5MB of YUV data (30 frames of CIF 4:2:0 YUV), which will be linked into the resulting ELF.
It also pre-calculates a large array of values that used to take a long time to compute in software each time the encoder started. To save time, I've created a bash script which generates this array and saves it in C format. This can still take a while, but it only gets done once, and at the speed of your host computer rather than at the speed of a simulated OR1k. Even so, you'll notice that when x264 runs it still spends a number of seconds at the beginning in the function x264_analyse_init_costs(), generating a large array for each value of lambda. If anyone knows a way around this, please let us know so we can make this step more efficient.

Run x264 in or1ksim and connect with GDB

Finally! We get to run x264 in or1ksim. There is a Makefile rule to build and run the executable in or1ksim (just do make sim), but to do it manually you can do the following:

user@host:or32-x264/x264$ or32-elf-sim -f rsp_or1ksim_x264.cfg x264

This uses the included or1ksim configuration file, rsp_or1ksim_x264.cfg, which starts the simulator and then waits for GDB to connect. You should see or1ksim boot and then wait for a connection from the debugger. I've hardcoded or1ksim to listen for GDB on port 5554, but you can change this in the or1ksim config file. In another console window we want to start up GDB. Run the OpenRISC-compatible GDB (usually built and installed with the OpenRISC GNU toolchain port). It may live in another OpenRISC toolchain path; it's fine to just copy it into your other toolchain's bin directory, or to call it directly.
user@host:or32-x264/x264$ or32-elf-gdb x264

Now you'll be at the GDB prompt. The following list of commands connects to or1ksim and sets it running to the end of the main() function, where it will break so that we can dump the file.
(gdb) target remote localhost:5554

Or1ksim should have printed out a line like the following:

close_file_bsf: wrote 81115 bytes from 0x01d59c00 to 0x01d6d8db

This indicates where the encoded video bytestream is in the system. We'll dump it out with the following command in GDB:

(gdb) dump binary memory sim_dump.264 0x01d59c00 0x01d6d8db

This will create the file sim_dump.264 containing the contents of the memory range we gave. Just tell GDB to continue once more and the code will make the simulator finish and shut down.
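GDB's dump binary memory command writes the bytes from the start address up to, but not including, the end address, so the end address is simply the start address plus the byte count from the close_file_bsf line. A tiny C check with the numbers above (the helper names are hypothetical, just for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: exclusive end address to pass to GDB's
 * "dump binary memory FILE START END", which copies END - START bytes. */
static uint32_t dump_end_address(uint32_t start, uint32_t nbytes)
{
    return start + nbytes;
}

/* Hypothetical helper: size of the file produced by dumping [start, end). */
static uint32_t dump_length(uint32_t start, uint32_t end)
{
    return end - start;
}
```

So the 0x01d6d8db printed after "to" is one past the last byte written, and sim_dump.264 should come out at exactly 81115 bytes.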
(gdb) c

The video can be played with ffplay like so:
user@host:or32-x264/x264$ ffplay -s cif sim_dump.264

It's only thirty frames, but it confirms that or1ksim is doing its job.

To do

Julius
RE: x264 in or1ksim
by julius on Nov 17, 2009
Example hardware module model in or1ksim

I've taken the SAD and SSD loops from the x264 software and implemented them in "hardware". Performance-wise I don't think this is particularly useful; I've done it just as an example of how to write a new module for or1ksim, the OpenRISC architectural simulator, and then use it from the x264 software. An attempt was made to make the module's performance somewhat representative of a hardware implementation, by trying to provide accurate cycle counts for its computation.

New patches

There are a couple of new patches up in the repository. One is a patch for or1ksim-0.3.0 implementing the example module, and the other is a patch for x264 adding code which uses the new module. Check out the repository and read the READMEs in the respective patch directories for or1ksim and x264.

Example SAD/SSD module

I found the SAD and SSD algorithms to be very simple and similar, so I thought they were a good choice to implement in a single module.

Register interface

The functions in the C code (common/pixel.c) are generated with defines, one for each of SAD and SSD and for each block size of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4. Each function is passed pointers to the two blocks of pixel data (their top-left pixels) and the strides to the next row, and returns a single integer: the SAD or SSD value for that particular block. The register interface is very simple: one register for each function parameter, one for control, and one for returning the result. I used the following struct:
typedef struct {

One bit in the control register selects SAD or SSD, and another bit tells the module to start processing. This start/busy bit is cleared by the module when it has finished; the software, after setting the bit, should poll it until it goes low. A very simple interface. The result is stored in the so-called result register, which the software reads after the busy bit in the control register goes low. The patch implementing this in the software (x264-e381f6d-or32-or1ksim-with-fp-1.2.patch) modifies the defined SAD/SSD functions to use this module instead of doing the work on the processor; see the file common/pixel.c in the patched x264 sources. The define OR32_SADSSDMOD (defined in common/or32/or32.h) controls whether or not the module is used.

Implementing the SAD/SSD module in or1ksim

Here we look at building this simple module in or1ksim.

How or1ksim works, setting up the module

I hadn't worked much with or1ksim before and wasn't aware of its internal structure, but it turns out to be very straightforward and easy to use. A good example of a simple generic module is in peripheral/generic.c in the or1ksim source. Each module provides a few required and many optional functions. First is a sort of constructor (generic_sec_start() in peripheral/generic.c), which is called when or1ksim parses the config file and notices the start of a new section, as well as a function to properly instantiate the module once the config file has been parsed (generic_sec_end()). Others include functions giving access to configurable parameters (generic_name(), generic_size(), generic_baseaddr() etc.), and a function which registers these constructor/setter/getter functions so the main simulator knows how to call them (reg_generic_sec()). I'm using object-oriented terms here, but or1ksim is written entirely in C, not C++.
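The software side of the handshake described above (write the parameters, set the start/busy bit, poll until the module clears it, then read the result) can be sketched as follows. Everything concrete here is hypothetical: the field order, the block_dims encoding, the control bit positions and the names are assumptions, since the real struct from the 1.2 patch isn't reproduced in this post.

```c
#include <stdint.h>

/* Hypothetical register block: one 32-bit word per register. The field
 * order, the block_dims encoding and the control bit positions are
 * assumptions, not the layout used in the 1.2 patch. */
typedef struct {
    volatile uint32_t control;    /* start/busy and SAD/SSD select bits */
    volatile uint32_t cur_addr;   /* top-left pixel of the current block */
    volatile uint32_t cur_stride; /* bytes from one row to the next */
    volatile uint32_t ref_addr;   /* top-left pixel of the reference block */
    volatile uint32_t ref_stride; /* bytes from one row to the next */
    volatile uint32_t block_dims; /* width in bits 15:8, height in bits 7:0 */
    volatile uint32_t result;     /* SAD or SSD, valid once busy clears */
} sadssd_regs_t;

#define SADSSD_CTRL_BUSY (1u << 0) /* set by software, cleared by module */
#define SADSSD_CTRL_SSD  (1u << 1) /* 0 = SAD, 1 = SSD */

/* Load the parameters, then write control last to kick off the job. */
static void sadssd_start(sadssd_regs_t *regs,
                         uint32_t cur, uint32_t cur_stride,
                         uint32_t ref, uint32_t ref_stride,
                         unsigned width, unsigned height, int ssd)
{
    regs->cur_addr = cur;
    regs->cur_stride = cur_stride;
    regs->ref_addr = ref;
    regs->ref_stride = ref_stride;
    regs->block_dims = ((uint32_t)width << 8) | (uint32_t)height;
    regs->control = SADSSD_CTRL_BUSY | (ssd ? SADSSD_CTRL_SSD : 0u);
}

/* Poll until the module clears the busy bit, then read the result. */
static uint32_t sadssd_wait(const sadssd_regs_t *regs)
{
    while (regs->control & SADSSD_CTRL_BUSY)
        ; /* busy-wait on the control register */
    return regs->result;
}
```

On the target, regs would point at the module's base address from the or1ksim config file, e.g. (sadssd_regs_t *)SADSSD_BASE with SADSSD_BASE a placeholder, and a SAD call becomes sadssd_start() followed by sadssd_wait(). Writing the control register last keeps every parameter in place before the busy bit can be seen by the module.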
The function which registers the module's own functions must then be called from the simulator's startup routine, where such registration functions are expected to be called. That routine is reg_config_specs() in sim-config.c in or1ksim; you'll notice all the other modules and peripherals have their section functions included there too. So far so good, but what we mainly want is for the module to be memory-mapped accessible from the processor, and for the module to be able to access system memory.

Memory accesses

The memory subsystem of or1ksim requires each new module to tell it which kinds of accesses it supports (read and/or write; 8, 16 and 32-bit wide), its base address and the size of its address space. In the function generic_sec_end() you can see that it checks which capabilities have been enabled (only the access size is configurable, not ro/rw/wo) and sets up a struct mem_ops variable accordingly. It then passes this struct, along with the module's base address and size (span of addresses), to the memory management system via the reg_mem_area() function. The simulator then knows where the module is, how wide its accesses are and what kind of accesses it supports, making it accessible from the processor.

For memory accesses from the module to the rest of the system, the simulator provides a set of functions for reading and writing 8, 16 and 32-bit wide values. The read functions are eval_direct8(), eval_direct16() and eval_direct32(); the write functions are set_direct8(), set_direct16() and set_direct32(). The first parameter is the address, the second (for the write functions) is the value, and the last two of each indicate whether to go through the cache or MMU. These are used for statistical purposes only, so just ignore them in this case.
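On the simulator side, the read/write callbacks of a word-only module mostly boil down to turning an address inside the registered span into a register index. The sketch below shows just that decoding in plain C. It deliberately doesn't call or1ksim's mem_ops or reg_mem_area(), whose exact signatures aren't given here, and the register count, base address and names are all hypothetical.

```c
#include <stdint.h>

#define SADSSD_NUM_REGS 7u /* hypothetical register count */

/* Hypothetical module state: the registers kept as plain words on the
 * simulator side, plus the base address taken from the config file. */
typedef struct {
    uint32_t base;
    uint32_t regs[SADSSD_NUM_REGS];
} sadssd_model_t;

/* Map a bus address to a register index; -1 if below the base,
 * not word aligned, or past the last register. */
static int sadssd_reg_index(const sadssd_model_t *m, uint32_t addr)
{
    uint32_t off = addr - m->base;
    if (addr < m->base || off % 4u != 0u || off / 4u >= SADSSD_NUM_REGS)
        return -1;
    return (int)(off / 4u);
}

/* 32-bit read: invalid addresses read back as 0 in this sketch. */
static uint32_t sadssd_read_word(const sadssd_model_t *m, uint32_t addr)
{
    int i = sadssd_reg_index(m, addr);
    return i < 0 ? 0u : m->regs[i];
}

/* 32-bit write: invalid addresses are ignored in this sketch. */
static void sadssd_write_word(sadssd_model_t *m, uint32_t addr, uint32_t value)
{
    int i = sadssd_reg_index(m, addr);
    if (i >= 0)
        m->regs[i] = value;
}
```

Rejecting unaligned or out-of-range offsets keeps the model consistent with supporting only 32-bit accesses; a fuller model might flag such accesses rather than silently returning 0.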
Setting up the SAD/SSD module

In the new patch for or1ksim implementing this module (to be applied on top of the existing or1ksim patch enabling floating point capability), the module's code can be found in the file video_enc/x264_sadssdmod.c. I chose to support only full 32-bit word reads and writes, and the only options configurable from the config file at runtime are the module's base address, its name and whether or not it's enabled. At the top of this file I've declared the same register struct as in x264, and made the registers accessible via the x264_sadssdmod_read_word() and x264_sadssmod_write_word() functions.

How do you get it to do things?

Along with registering the module's configuration functions and its memory settings, you also register a reset function. This reset function, x264_sadssdmod_reset() in this case, is called once before simulation begins. With the SCHED_ADD(*function, void* data, int num_cycles) macro, we can schedule a particular function to be called, and passed a pointer to some data, after num_cycles simulated clock cycles. Using this feature we can insert a hook for our function which does stuff. In this case the stuff is the desired behaviour of the module: monitor the module's registers and react whenever the busy bit is set in the control register, performing either the SAD or SSD calculation for the given parameters and leaving the calculated value in the result register. The job function is registered in the reset function like so: SCHED_ADD (x264_sadssdmod_job, dat, 1), and is then called on the next cycle.

Implementing the SAD/SSD algorithm

The algorithm is very simple: for SAD we accumulate the absolute difference between each pixel in the current and reference blocks, and for SSD we square this difference and accumulate that instead. Although it's not perfect, an attempt was made to code the module like an FSM in hardware. I initialise the module so that the job function is called each clock cycle.
I've tried to make it do approximately one clock cycle's worth of work before rescheduling itself and returning. All we're really doing once we get the go signal is running a couple of for loops, so it was relatively easy to implement: for each cycle in which the module is active and processing, I do one step of the algorithm. The following is the step for the SAD algorithm:
dev->regs->result +=

Some state variables, updated each cycle, allow us to track where we are in the loops. You can see here that the x_count state variable holds how far through the inner for loop we are. It's probably not accurate to say this could occur in a single cycle; it would probably take several, due to the two memory accesses, the subtraction, the absolute value calculation and the addition/accumulation. This can be tuned by changing the number of cycles we schedule before calling the function again. Finally, when the for loops are complete, the last thing the module does is clear the busy bit. The processor reads this at some point afterwards, and then reads the result register. The module returns to polling the busy bit, waiting for it to be asserted again.

Results

When encoding 5 frames of CIF with exactly the same simulator configuration, doing SAD/SSD on the processor takes 37,133,265,830 cycles, while with the hardware module the simulator reports only 28,249,742,100 cycles, or about 24% fewer. I do admit, though, that this is optimistic, as the hardware module model doesn't accurately represent the cycles taken by the step where it does the actual calculation. Plus, I think there's some tuning of the simulator as a whole to be done here. Using those cycle counts, even a processor running at 100MHz would take about 371 seconds (about 282 seconds with the module) to encode just 5 frames. I've yet to run this in hardware, but I don't think that is right. One thing which is annoying is the calculation of a big cost vector in the x264_analyse_init_costs() function at the beginning, which I think takes more than half the time.

Other or1ksim modules

The main idea of this was to show how to use or1ksim to model any potential hardware modules implemented to speed up x264. They can be processor independent and interfaced via a simple method. Interrupts could also be used, for instance to run multiple modules in parallel.
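The per-cycle behaviour described in the module walkthrough (one step of the loops per scheduled call, the loop position kept in state variables such as x_count, busy cleared when the loops finish) can be sketched as a stand-alone C model. This is hypothetical and outside or1ksim: apart from the x_count idea, the struct and names are made up, and the simple while loop stands in for the job function rescheduling itself with SCHED_ADD every cycle. The real code is in video_enc/x264_sadssdmod.c in the or1ksim patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical job state for a cycle-stepped SAD/SSD model. Only the
 * x_count idea comes from the post; the rest is illustrative. */
typedef struct {
    const uint8_t *cur, *ref;   /* top-left pixels of the two blocks */
    int cur_stride, ref_stride; /* elements from one row to the next */
    int width, height;          /* block size, 16x16 down to 4x4 */
    int ssd;                    /* 0 = SAD, 1 = SSD */
    int busy;                   /* set to start, cleared when finished */
    int x_count, y_count;       /* position in the inner/outer loops */
    uint32_t result;            /* running SAD or SSD */
} sadssd_job_t;

/* One "cycle" of work: handle a single pixel pair, advance the loop
 * counters, and clear busy after the last pixel. Idle calls do nothing. */
static void sadssd_step(sadssd_job_t *j)
{
    if (!j->busy)
        return;
    int d = (int)j->cur[j->y_count * j->cur_stride + j->x_count]
          - (int)j->ref[j->y_count * j->ref_stride + j->x_count];
    j->result += j->ssd ? (uint32_t)(d * d) : (uint32_t)abs(d);
    if (++j->x_count == j->width) {
        j->x_count = 0;
        if (++j->y_count == j->height)
            j->busy = 0;
    }
}

/* Stand-in for the simulator calling the job every cycle: step until
 * busy clears and return how many cycles that took. */
static unsigned long sadssd_run(sadssd_job_t *j, uint32_t *out)
{
    unsigned long cycles = 0;
    while (j->busy) {
        sadssd_step(j);
        cycles++;
    }
    *out = j->result;
    return cycles;
}

/* Plain nested-loop version for comparison, i.e. the work done on the
 * processor when the module is not used. */
static uint32_t sadssd_direct(const uint8_t *cur, int cur_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height, int ssd)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            int d = (int)cur[y * cur_stride + x] - (int)ref[y * ref_stride + x];
            sum += ssd ? (uint32_t)(d * d) : (uint32_t)abs(d);
        }
    return sum;
}
```

In this sketch a 16x16 block costs 256 cycles, one per pixel. As noted above, a real datapath would probably need several cycles per pixel; in or1ksim that is modelled by scheduling the next call further ahead rather than on every cycle.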
I think some useful tests of potential SW/HW partitioning can be done using this method. Tuning the hardware module models is important to get somewhat accurate results for the performance impact.

I hope this is useful.

Julius
RE: x264 in or1ksim
by ethanli on Nov 20, 2009
I tried patch 1.1, connected with GDB and ran x264, but I got an odd error:
Listening for RSP on port 5554
Remote debugging from host 0.0.0.0
get_frame_total_yuv: 352 288 4561920 30
x264 [info]: using cpu capabilities: none!
x264_analyse_init_costs: lambda (= x264_lambda_tab[10]) = 1
x264_analyse_init_costs: lambda (= x264_lambda_tab[16]) = 2
x264_analyse_init_costs: lambda (= x264_lambda_tab[20]) = 3
x264_analyse_init_costs: lambda (= x264_lambda_tab[23]) = 4
x264_analyse_init_costs: lambda (= x264_lambda_tab[26]) = 5
x264_analyse_init_costs: lambda (= x264_lambda_tab[27]) = 6
x264_analyse_init_costs: lambda (= x264_lambda_tab[29]) = 7
x264_analyse_init_costs: lambda (= x264_lambda_tab[30]) = 8
x264_analyse_init_costs: lambda (= x264_lambda_tab[31]) = 9
x264_analyse_init_costs: lambda (= x264_lambda_tab[32]) = 10
x264_analyse_init_costs: lambda (= x264_lambda_tab[33]) = 11
x264_analyse_init_costs: lambda (= x264_lambda_tab[34]) = 13
x264_analyse_init_costs: lambda (= x264_lambda_tab[35]) = 14
x264_analyse_init_costs: lambda (= x264_lambda_tab[36]) = 16
x264_analyse_init_costs: lambda (= x264_lambda_tab[37]) = 18
x264_analyse_init_costs: lambda (= x264_lambda_tab[38]) = 20
x264_analyse_init_costs: lambda (= x264_lambda_tab[39]) = 23
x264_analyse_init_costs: lambda (= x264_lambda_tab[40]) = 25
x264_analyse_init_costs: lambda (= x264_lambda_tab[41]) = 29
x264_analyse_init_costs: lambda (= x264_lambda_tab[42]) = 32
x264_analyse_init_costs: lambda (= x264_lambda_tab[43]) = 36
x264_analyse_init_costs: lambda (= x264_lambda_tab[44]) = 40
x264_analyse_init_costs: lambda (= x264_lambda_tab[45]) = 45
x264_analyse_init_costs: lambda (= x264_lambda_tab[46]) = 51
x264_analyse_init_costs: lambda (= x264_lambda_tab[47]) = 57
x264_analyse_init_costs: lambda (= x264_lambda_tab[48]) = 64
x264_analyse_init_costs: lambda (= x264_lambda_tab[49]) = 72
x264_analyse_init_costs: lambda (= x264_lambda_tab[50]) = 81
x264_analyse_init_costs: lambda (= x264_lambda_tab[51]) = 91
x264 [debug]: VBV maxrate unspecified, assuming CBR
x264 [info]: profile Baseline, level 3.0
x264 [debug]: frame= 0 QP=27.04 NAL=3 Slice:I Poc:0 I:396 P:0 SKIP:0 size=8996 bytes
x264 [debug]: frame= 1 QP=30.75 NAL=2 Slice:P Poc:2 I:3 P:195 SKIP:198 size=626 bytes
x264 [error]: malloc of size 585728 failed
x264 [error]: x264_encoder_encode failed

I don't believe my server run out of memory. It just needs around 600K. Someone said ffmpeg causes this problem.
RE: x264 in or1ksim
by julius on Nov 20, 2009
I don't believe my server run out of memory.
Don't worry, it's or1ksim's simulated memory that is running short here, not your actual system's. How big is the yuv_data.elf that gets generated? Maybe for some reason it's too big and using up all the RAM in the system? Although I don't think this is possible due to the linker script defining where it can go. Are there any other modifications you've done to the x264 code or the code the patch adds? Are you sure you're running or1ksim and specifying the right simulation file with the -f option?

Julius
RE: x264 in or1ksim
by ethanli on Nov 23, 2009
How big is the yuv_data.elf that gets generated? Maybe for some reason it's too big and using up all the RAM in the system? Although I don't think this is possible due to the linker script defining where it can go.

4562381 Nov 23 14:49 yuv_data.elf

Are there any other modifications you've done to the x264 code or the code the patch adds? Are you sure you're running or1ksim and specifying the right simulation file with the -f option?

No, I followed the post exactly. But my or32-elf-gdb comes from my uClibc toolchain, since I didn't find one in my newlib toolchain. It's just a client, so it should probably be OK. Even after changing the config to increase the memory size, I get the same problem. While debugging, I confirmed that the error really does come from malloc.
RE: x264 in or1ksim
by kahomike on Nov 29, 2009
kahomike
Posts: 4 Joined: Aug 22, 2009 Last seen: Dec 14, 2009
Dear Julius:
Thanks a lot for your great work! I can compile and run x264 in or1ksim. However, whenever I increase the number of reference frames used, e.g. by setting

param.i_frame_reference = 2;

in x264.c, I get an error and the encoding ends:

x264 [error]: malloc of size 585728 failed
x264 [error]: x264_encoder_encode failed
exit(-1)

May I ask how can this problem be solved? Thanks a lot if you can help.

Regards,
Mike
RE: x264 in or1ksim
by julius on Nov 29, 2009
May I ask how can this problem be solved?
This is a good question. I'm not seeing this problem myself. It would be great if you could show me the commands you used to compile and run x264. Perhaps you could put them in a text file and attach it to a post (or use pastebin etc.), rather than pasting everything into a post! I've been doing further work and hope to have a new patch out soon with some very handy features (exact memory accesses, instructions executed etc.) and a better setup. Very soon (within a few days) I aim to get some solid numbers on what kind of performance increase, in cycles, we need to achieve to make this system realisable on an FPGA. But if you could post and attach a log of the compilation and execution of x264 in or1ksim, that'd be great. From memory, the heap variable is in the sbrk() function, which is part of newlib, so it sounds to me like not enough memory is being allocated by the linker. It's strange that I'm not seeing this, but sometimes when creating the patches I make a mistake and leave out a file, like a linker script, which results in subtle and hard-to-trace problems.

Julius
RE: x264 in or1ksim
by kahomike on Nov 30, 2009
Dear Julius:
Thanks for your prompt reply. I built the toolchain and patched x264 following your post of Oct 27, 2009 exactly, i.e. I used the patch x264-e381f6d-or32-or1ksim-with-fp-1.0.patch. The attached compile_and_run.txt contains the output of compiling and running x264. The other configs I modified in x264.c are:

param.analyse.intra = X264_ANALYSE_I8x8;
param.analyse.inter = X264_ANALYSE_PSUB16x16;
param.analyse.i_subpel_refine = 1;

but these are fine with param.i_frame_reference = 1 or 0. Every other param.i_frame_reference value leads to the same error:

x264 [error]: malloc of size 585728 failed

(as shown in compile_and_run.txt). Other sequences, e.g. Akiyo and Stefan, have the same problem. I also tried to increase the memory size by modifying or1ksim_x264.cfg:

section memory
  ...
  name = "RAM"
  ...
  baseaddr = 0x00000000
  size = 0x10000000 // original is size = 0x02000000
  ...
end

but the problem still exists.

Regards,
Mike
compile_and_run.txt (16 kb)